Visualizing textual models with in-text and word-as-pixel highlighting
Figure: A topic model's token-level posterior memberships P(z_t | w_t), shown as in-text annotation (§3) and word-as-pixel (§4) views, from a corpus of U.S. presidential State of the Union speeches. The speeches are concatenated and run in columns; the top-left is 1946 and the bottom-right is 2007. (This version shows a sample of tokens.)
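The caption's token-level posterior P(z_t | w_t) can be illustrated with a small sketch. This is not the authors' code. It assumes a fitted LDA-style model whose topic-word probabilities `phi` and document-topic proportions `theta` are given (both hypothetical inputs here). It computes P(z_t = k | w_t, d) ∝ θ_{d,k} φ_{k,w_t} and then colors each token in HTML by its most probable topic, which is one simple form of in-text annotation. The palette is an arbitrary choice.

```python
import html
from typing import Dict, List, Sequence

# Hypothetical palette: one color per topic index (arbitrary choice).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def token_posteriors(tokens: Sequence[str],
                     phi: Dict[str, List[float]],
                     theta: List[float]) -> List[List[float]]:
    """Return P(z_t = k | w_t, d) for each token, assuming fixed phi and theta.

    phi[w][k]: probability of word w under topic k (assumed given).
    theta[k]:  the document's proportion of topic k (assumed given).
    """
    posteriors = []
    for w in tokens:
        # Unnormalized posterior for each topic: theta_k * phi_{k,w}.
        scores = [t * p for t, p in zip(theta, phi[w])]
        total = sum(scores)  # assumed nonzero for every observed word
        posteriors.append([s / total for s in scores])
    return posteriors


def annotate(tokens: Sequence[str],
             posteriors: List[List[float]],
             palette: Sequence[str] = PALETTE) -> str:
    """Wrap each token in an HTML span colored by its most probable topic."""
    spans = []
    for w, post in zip(tokens, posteriors):
        k = max(range(len(post)), key=post.__getitem__)  # argmax topic
        spans.append(f'<span style="color:{palette[k]}">{html.escape(w)}</span>')
    return " ".join(spans)


if __name__ == "__main__":
    toks = ["war", "tax"]
    phi = {"war": [0.8, 0.1], "tax": [0.1, 0.7]}  # toy two-topic model
    post = token_posteriors(toks, phi, theta=[0.5, 0.5])
    print(annotate(toks, post))
```

Coloring by the argmax discards the model's uncertainty; blending topic colors in proportion to the posterior is one alternative.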
Abstract
We explore two techniques that use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model’s view of particular tokens in particular documents. The other uses a high-level “words-as-pixels” graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives on a model’s understanding of text. We show how these interconnected methods help diagnose a classifier’s poor performance on Twitter slang and make sense of a topic model fitted to historical political texts.
2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, USA. Copyright by the author(s).
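The words-as-pixels view can be approximated with a short layout routine. This sketch assumes each token already carries one discrete label (for example, its most probable topic or a classifier's predicted class). It fills a grid in column-major order, so tokens run down each column and then wrap to the next one, matching the figure's description of concatenated speeches running in columns from top-left to bottom-right. It then writes a dependency-free plain-text PPM image. The function names, row count, and palette are hypothetical choices, not the paper's.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def words_as_pixels(labels: Sequence[int], n_rows: int,
                    palette: Sequence[RGB],
                    background: RGB = (255, 255, 255)) -> List[List[RGB]]:
    """Lay out one colored pixel per token in column-major order.

    Tokens fill a column top to bottom, then continue in the next column
    to the right; unused cells in the last column keep the background.
    """
    n_cols = -(-len(labels) // n_rows)  # ceiling division
    grid = [[background] * n_cols for _ in range(n_rows)]
    for i, label in enumerate(labels):
        col, row = divmod(i, n_rows)
        grid[row][col] = palette[label]
    return grid


def to_ppm(grid: List[List[RGB]]) -> str:
    """Serialize the grid as a plain-text PPM (P3) image."""
    height, width = len(grid), len(grid[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in grid:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy labels for seven tokens under a hypothetical two-class palette.
    grid = words_as_pixels([0, 1, 0, 1, 1, 0, 0], n_rows=3,
                           palette=[(200, 30, 30), (30, 30, 200)])
    print(to_ppm(grid))
```

With a real corpus, the row count sets the image's aspect ratio; a larger value gives taller, fewer columns.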
Similar resources
Material Development and English for Academic Purposes Word Lists; a Reductionist Approach
Nagy (1988) states that vocabulary is a prerequisite factor in comprehension. Drawing upon a reductionist approach and having in mind the prospects for material development, this study aimed at creating an English for Academic Purposes Word List (EAPWL). The corpus of this study was compiled from a corpus containing 6479 pages of texts, 2,081,678 tokens (running words) and 63825 types (...
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS
Using the Attraction Model to Extract the Boundary Curvature of a Dam Lake
Introduction The attraction model algorithm spatially depends on the neighborhoods of the central pixels that are attracting surrounding sub-pixels. Another possibility is the hypothesis of subpixel interaction as introduced by Mertens et al. (2003) and Atkinson (2005). In order to reach a pixel state with the maximum number of sub-pixels of identical classes neighboring, there are several met...
Publication date: 2016